Method for training a neural network
Description



1. FIELD OF THE INVENTION 



The invention relates to a method for training a 
neural network to determine risk functions for patients 
following a first occurrence of a predetermined disease 
on the basis of given training data records containing 
objectifiable and for the most part metrologically
captured data relating to the medical condition of the 
patient, wherein the neural network comprises an input 
layer having a plurality of input neurons and at least 
one intermediate layer having a plurality of intermediate 
neurons, as well as an output layer having a plurality of 
output neurons, and a multiplicity of synapses which 
interconnect two neurons of different layers in each 
case.



2. TECHNICAL BACKGROUND - PRIOR ART

2.1. General

For large scale data analysis, neural networks 
have supplemented or replaced hitherto conventional 
methods of analysis in many fields. It has been shown
that neural networks are better than conventional
methods at discovering and identifying hidden, not
immediately evident dependencies between individual
input data in the datasets. When new data of the same data
type is input, neural networks which have been trained 
using a known dataset therefore deliver more reliable 
results than previous methods of analysis. 

In the field of medical applications for example, 
the use of neural networks to determine a survival 
function for patients suffering from a particular 
disease, such as cancer, is known. Said survival function 
indicates the probability of a predetermined event
occurring for the patient in question depending on the
time that has elapsed since the first occurrence of the 
disease. Said predetermined event need not necessarily be 
the death of the patient, as would be inferred from the 
designation "survival function", but may be any event, 
for example a recurrence of cancer. 

The data records comprise a whole range of 
objectifiable information, that is to say data on whose
value any neural network operator has no influence and 
whose value can be automatically captured if desired. In 
the case of breast cancer this is information about the 
patient's personal data, such as age, sex and the like, 
information about the medical condition, such as number 
of lymph nodes affected by cancer, biological tumor 
factors such as uPA (urokinase plasminogen activator),
its inhibitor PAI-1 and similar factors, as well as
information about the treatment method, for example type,
duration and intensity of chemotherapy or radiotherapy.
It goes without saying that a whole range of the 
abovementioned information, in particular the information 
about the medical condition, can only be determined using
suitable measuring apparatus. Furthermore, the personal
data can be automatically read in from suitable data
media, for example machine-readable identity cards or the
like. If they are not all available at the same time,
which is often the case especially with laboratory
measurements, the objectifiable data can of course be
temporarily stored in a database on a suitable storage 
medium before they are fed to the neural network as input 
data. 

2.2. The neural network as signal filter

In accordance with the foregoing, therefore, it 
is possible to conceive of a neural network as a kind of 
"signal filter" that filters out a meaningful output 
signal from a noisy, and therefore as yet non-meaningful 
input signal. As with any filter, whether or how well the 
filter is able to fulfill its function depends on whether
it is possible to keep the intensity of the filter's 
intrinsic noise low enough that the signal to be filtered 
out is not lost in this intrinsic noise. 

The greater the number of data records available 
for training the neural network on the one hand and the 
simpler the structure of the neural network on the other 
hand, the lower the intensity of the "intrinsic noise" of 
a neural network. Moreover, the generalizability of the 
network increases, the simpler the structure of the 
neural network. In the case of a conventional procedure 
in the prior art, therefore, one part of the training of 
neural networks is concerned with locating and 
eliminating parts of the structure that can be dispensed 
with for obtaining a meaningful output signal. With this 
"thinning out" (also known as "pruning" in the jargon) 
however, a further constraint to be taken into account is 
that the structure of the neural network cannot be 
"pruned" ad infinitum because as the complexity of the 
neural network is reduced, its ability to map complex 
interrelationships, and hence its meaningfulness, is also
diminished. 



2.3. Problems with medical application

In practice, and in particular in the case of the
medical application of neural networks mentioned at the 
beginning, the problem is often encountered that only 
very small datasets of typically a few hundred data 
records are available for training the neural network. To 
compound the difficulty, not only a training dataset, but 
also a validation dataset and a generalization dataset 
must be provided for the training. The significance of 
said two datasets will be discussed in greater detail 
below in sections 5.5 and 5.7. 

With such small datasets, the use of known
pruning methods always led to so great a simplification
of the structure of the neural network that the
meaningfulness of the neural network diminished to an
unacceptable level. To nevertheless obtain neural
networks that delivered meaningful output signals after 
completion of the training phase, in the prior art neural 
networks with a rigid, that is to say fixed and 
invariable, structure were used where only small training 
datasets were available. The degree of complexity, or the 
simplicity, of this rigid structure was selected here on 
the basis of empirical knowledge in such a way that the 
neural network had on the one hand a high degree of 
meaningfulness while on the other hand having a still 
acceptable intrinsic noise level. It has hitherto been 
assumed that the specification of an invariable structure 
was unavoidable. 

Another problem with medical applications of 
neural networks is the fact that only "censored" data are 
available for training. The term "censored" is used to 
denote the circumstance that it is not possible to 
foresee the future development for patients who have 
fortunately not yet suffered a relapse at the time of 
data capture, and statements about the survival function 
are therefore only possible up until the time the data 
were recorded. 

It goes without saying that, in medical
applications in particular, a truly meaningful result
cannot be dispensed with under any circumstances. It is
never acceptable for even one single patient to be denied
a treatment simply because the neural network did not
consider it necessary. The consequences for the patient
could be incalculable.

With respect to the details of the prior art 
outlined above, please see the articles listed in section
6. "References".

3. OBJECT OF THE INVENTION

In the light of the above, the object of the 
invention is to provide an automatic method for training 
a neural network to determine risk functions for patients
following a first occurrence of a predetermined disease, 
which method permits, despite a low number of available 
training data records, the use of a neural network having 
a variable structure and the optimization of its 
structure in at least one structure simplification step. 

4. ACHIEVEMENT OF THE OBJECT

According to the invention, this object is 
achieved by a method for training a neural network to 
determine risk functions for patients following a first 
occurrence of a predetermined disease on the basis of 
given training data records containing objectifiable and
metrologically captured data relating to the medical
condition of the patient, wherein the neural network
comprises:

an input layer having a plurality of input neurons,
at least one intermediate layer having a plurality
of intermediate neurons,
an output layer having a plurality of output
neurons, and
a multiplicity of synapses which interconnect two
neurons of different layers in each case,

wherein the training of the neural network comprises a
structure simplification procedure, that is to say the
location and elimination of synapses that have no
significant influence on the curve of the risk function,
in that one either

a1) selects two sending neurons that are connected to
one and the same receiving neuron,
a2) assumes that the signals output from said sending
neurons to the receiving neuron essentially exhibit
the same qualitative behavior, that is to say are
correlated to one another,
a3) interrupts the synapse of one of the two sending
neurons to the receiving neuron and instead adapts
accordingly the weight of the synapse of the
respective other sending neuron to the receiving
neuron,
a4) compares the reaction of the neural network changed
in accordance with step a3) with the reaction of the
unchanged neural network, and
a5) if the variation of the reaction does not exceed a
predetermined level, decides to keep the change made
in step a3),

or in that one

b1) selects a synapse,
b2) assumes that said synapse does not have a
significant influence on the curve of the risk
function,
b3) interrupts said synapse,
b4) compares the reaction of the neural network changed
in accordance with step b3) with the reaction of the
unchanged neural network, and
b5) if the variation of the reaction does not exceed a
predetermined level, decides to keep the change made
in step b3).

A neural network trained in the manner described 
above assists the attending physician for example when 
deciding on the follow-up treatment for a particular 
newly operated patient. For this the physician can input 
into the neural network the patient data and the data 
metrologically captured in the laboratory relating to the 
medical condition of the first treatment, and receives 
from the neural network information about what type of 
follow-up treatment would produce the most favorable 
survival function for the patient in question. It is of 
course also possible to take account of the 
aggressiveness of the individual types of follow-up 
treatment so that, given an equally favorable or 
virtually equally favorable survival function, the least 
aggressive follow-up treatment for the patient can be 
selected.

5. EXEMPLARY EMBODIMENT

The invention is explained in greater detail 
below with reference to an exemplary embodiment. 

5.1. Structure of neural networks

Fig. 1 shows the structure of a neural network 
which is constructed in the manner of a multi-layer 
perceptron. In this case the neural network comprises: 

an input layer having a plurality of input neurons
Ni (i for "input neuron"),
at least one intermediate layer having a plurality
of intermediate neurons Nh (h for "hidden neuron"),
an output layer having a plurality of output neurons
No (o for "output neuron"), and
a multiplicity of synapses which interconnect two
neurons of different layers in each case.

In the simplified embodiment according to Fig. 1,
on which the following discussion will be based for the 
sake of clarity, only a single intermediate layer is 
provided, and the neurons (or nodes as they are also 
frequently called) of the output layer are connected via 
synapses (also called "connectors") both to each neuron
of the input layer and to each neuron of the intermediate
layer.

The number of input neurons is usually chosen 
depending on the number of objectifiable items of
information available. However, if the time required for 
determining the reaction of the neural network should 
consequently rise to an unacceptable level, then it is 
possible, for example with the aid of neural networks 
having a greatly simplified structure, to make a 
preliminary estimation of the significance of the 
individual objectifiable items of information for the
meaningfulness of the overall system. It should however 
be stressed that this preliminary estimate is also 
performed automatically and without the intervention of 
the respective operator. Furthermore, the number of 
output neurons is chosen to be large enough that, for the
purposes of a series expansion of the survival function, 
a sufficient number of series expansion terms are 
available to achieve a meaningful approximation to the 
actual survival function. Finally, the number of 
intermediate neurons is chosen to be large enough that 
the results of the trained neural network are meaningful, 
but small enough that the time required to determine the 
result is acceptable. 

5.2. Function of neural networks

5.2.1. General

Each neuron receives a stimulation signal S, 
processes it in accordance with a predetermined 
activation function F(S) and outputs a corresponding 
response signal A = F(S) which is fed to all neurons 
located below said neuron. The stimulation signal Sy that 
acts on the neuron Ny in question is usually formed by 
summing the response signals Ax of the neurons Nx located 
above said neuron Ny, with the contribution of each
neuron Nx entering the sum multiplied by a weighting
factor wxy that states the strength of the synapse
connecting the two neurons.

Stimulation signal: S_y = \sum_x w_{xy} A_x
Response signal: A_y = F(S_y)

5.2.2. Input layer 

The stimulation signals Si of the input neurons Ni
are formed by the input data Xi,j relating to a particular
patient j.

Stimulation signal: S_i = X_{i,j}

In order to be able to interpret the weights of
the synapses of a neural network appropriately, it is
preferable to work with variables whose values are of the
order of magnitude of 1. To achieve this despite the
usually very different distributions of input data, it is
customary to subject the input data to an appropriate
transformation. Said transformation is performed by the
activation function Fi of the input neurons:

Response signal: A_i = \tanh[(S_i - S_{i,mean}) / S_{i,Q}]

For the input data Xi,j, therefore, firstly the
mean value Si,mean over the patients j belonging to the
training dataset is formed. Secondly a scaling factor Si,Q
is formed. If the value of an input variable Xi,j is above
the mean Si,mean, then scaling is performed in accordance
with the 75% quartile. If, on the contrary, it is below
the mean value, then scaling is performed in accordance
with the 25% quartile. Finally, by using the hyperbolic
tangent function as the activation function Fi, scaled
response signals with values in the range between -1 and
+1 are readily obtained.
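
The input transformation described above can be sketched as follows. This is only an illustration under stated assumptions: the helper names quantile, fit_input_scaling and input_response are hypothetical, and since the text does not state exactly how the scaling factor is derived from the quartiles, the sketch assumes it is the distance between the training mean and the respective quartile.

```python
import math

def quantile(sorted_vals, q):
    """Linearly interpolated quantile of an ascending list (0 <= q <= 1)."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(math.floor(pos))
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1.0 - frac) + sorted_vals[hi] * frac

def fit_input_scaling(train_values):
    """Return (mean, lower_scale, upper_scale) from the training dataset only."""
    vals = sorted(train_values)
    mean = sum(vals) / len(vals)
    # Assumption: the scale is the distance from the mean to the quartile.
    upper = abs(quantile(vals, 0.75) - mean)
    lower = abs(mean - quantile(vals, 0.25))
    return mean, lower, upper

def input_response(x, mean, lower, upper):
    """A_i = tanh((S_i - S_mean) / S_Q), with S_Q chosen by the side of the mean."""
    scale = upper if x > mean else lower
    if scale == 0.0:
        scale = 1.0  # assumption: fall back to no scaling for degenerate data
    return math.tanh((x - mean) / scale)
```

The scaling parameters are fitted once on the training dataset and then reused unchanged for validation and generalization data.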

Note that the above transformation can be omitted 
for input data that already exhibit the desired 
distribution, categorical values or binary values. 

5.2.3. Intermediate layer 

The stimulation signal Sh for the neurons Nh of the
intermediate layer is formed by the weighted sum of the
response signals Ai of all neurons Ni of the input layer:

Stimulation signal: S_h = \sum_i w_{ih} A_i

Said stimulation signal Sh is transformed by the
neurons Nh in accordance with a given activation function
Fh, which may again be the hyperbolic tangent function for
example, into a response signal Ah:

Response signal: A_h = F_h(S_h - b_h)

In the field of neural networks, the parameters
bh are referred to as the "bias" of the respective neuron.
Like the values of the synapse weights wxy, the values of
said bias parameters bh are also determined during
training of the neural network.

5.2.4. Output layer 

The stimulation signal So and the response signal
Ao for a neuron No of the output layer are determined
analogously:

Stimulation signal: S_o = \sum_i w_{io} (A_i - c_i) + \sum_h w_{ho} A_h

Response signal: A_o = F_o(S_o - b_o)

The parameters bo again indicate the "bias" of the
neurons No of the output layer, while the parameters ci
serve to adapt the stimulation contributions of the
neurons Ni of the input layer and Nh of the intermediate
layer. The values of both the parameters bo and the
parameters ci are determined during the training phase of
the neural network. With respect to the bias values bo, it
may be favorable to require as a constraint that the
response of all output neurons No averaged across the
complete training dataset is zero. The identity function
F_o(x) = x can be used as the activation function Fo for
most applications, in particular for the present case
where the survival function is being determined for
cancer patients.

The response signals Ao of the output neurons No
indicate the respective coefficients of the associated
terms of the series expansion of the survival function
sought.
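
The signal flow of sections 5.2.3. and 5.2.4. can be sketched as a single forward pass. The function name forward and the nested-list layout of the weights are assumptions made for illustration; the sketch takes the hyperbolic tangent for the intermediate layer and the identity function for the output layer, as suggested in the text.

```python
import math

def forward(a_in, w_ih, b_h, w_io, c_i, w_ho, b_o):
    """Compute the output responses A_o for one patient.

    a_in        : responses A_i of the input neurons (already transformed)
    w_ih[i][h]  : weights input -> intermediate
    w_io[i][o]  : weights input -> output (direct connections)
    w_ho[h][o]  : weights intermediate -> output
    A weight of 0.0 stands for a suppressed synapse.
    """
    n_i, n_h, n_o = len(a_in), len(b_h), len(b_o)

    # Intermediate layer: S_h = sum_i w_ih * A_i ; A_h = tanh(S_h - b_h)
    a_h = []
    for h in range(n_h):
        s_h = sum(w_ih[i][h] * a_in[i] for i in range(n_i))
        a_h.append(math.tanh(s_h - b_h[h]))

    # Output layer: S_o = sum_i w_io * (A_i - c_i) + sum_h w_ho * A_h
    # with identity activation: A_o = S_o - b_o
    a_o = []
    for o in range(n_o):
        s_o = sum(w_io[i][o] * (a_in[i] - c_i[i]) for i in range(n_i))
        s_o += sum(w_ho[h][o] * a_h[h] for h in range(n_h))
        a_o.append(s_o - b_o[o])
    return a_o
```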

5.3. The survival function

As already mentioned above, the input data 
comprise information about the patient's personal data as 
well as information about the medical condition. All 
these data are captured at a time t = 0, in the case of
cancer patients the time of the first operation for 
example. Following the first operation, the patients then 
undergo a particular follow-up treatment, which may 
include chemotherapy and/or radiotherapy for example. 

The survival function S(t) indicates for a 
patient in question at a time t the probability that a 
particular event has not yet occurred. Said particular 
event may be, for example, a recurrence of cancer, or 
also in the worst case the death of the patient. In any 
case, S(0) = 1 holds for the survival function. In
addition, S(\infty) = 0 is usually assumed.

According to conventional notation, it is
possible to define an event density f(t) and a risk
function \lambda(t) on the basis of the survival function S(t):

f(t) = -dS/dt

\lambda(t) = f(t) / S(t)

from which it follows that:

\lambda(t) = -(d/dt) [\ln S(t)]

If one knows the curve of the risk function \lambda(t),
it is therefore possible to reconstruct the curve of the
survival function S(t) by means of integration.
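
Written out, the integration step reads as follows, with the risk function denoted \lambda(t) and with u as an integration variable introduced here only for illustration:

```latex
\lambda(t) = -\frac{d}{dt}\ln S(t)
\;\Longrightarrow\;
\ln S(t) - \ln S(0) = -\int_0^t \lambda(u)\,du
\;\Longrightarrow\;
S(t) = \exp\!\left(-\int_0^t \lambda(u)\,du\right),
\qquad\text{using } S(0) = 1 .
```

The event density then follows as f(t) = \lambda(t) S(t).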

The task of the neural network is to model the
curve of the risk function \lambda(t) in the manner of a
series expansion:

\lambda(t) = \lambda_0 \exp[\sum_o B_o(t) A_o]

According to the above notation, the parameters
Ao denote the response signals of the neurons No of the
output layer of the neural network. In the context of the
present invention, \lambda_0 is a parameter independent of t
which is used as a scaling factor. Bo(t) denotes a set of
functions that, as base functions of the series
expansion, enable a good approximation to the actual
curve of the risk function. It is possible to use for
example fractional polynomials or else functions such as
t^p (where p is not necessarily an integer) as the function
set Bo(t). B_{o1}(t) = 1; B_{o2}(t) = const \cdot t^{...}, ... were used for
the present invention.
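
A minimal numerical sketch of this series expansion is given below. The base functions in BASIS are purely hypothetical (the exponent 0.5 is chosen only as an example), as are the function names risk and survival; the survival function is obtained by integrating the risk function with the trapezoidal rule.

```python
import math

# Hypothetical base functions B_o(t); the exponent 0.5 is only an example.
BASIS = [lambda t: 1.0, lambda t: math.sqrt(t)]

def risk(t, a_out, lam0=1.0, basis=BASIS):
    """Risk function lambda(t) = lam0 * exp(sum_o B_o(t) * A_o)."""
    return lam0 * math.exp(sum(b(t) * a for b, a in zip(basis, a_out)))

def survival(t, a_out, lam0=1.0, basis=BASIS, steps=1000):
    """S(t) = exp(-integral_0^t lambda(u) du), using the trapezoidal rule."""
    if t <= 0.0:
        return 1.0
    h = t / steps
    total = 0.5 * (risk(0.0, a_out, lam0, basis) + risk(t, a_out, lam0, basis))
    total += sum(risk(k * h, a_out, lam0, basis) for k in range(1, steps))
    return math.exp(-h * total)
```

With a_out = [ln 0.1, 0] the risk is constant at 0.1 and the sketch reproduces the exponential survival curve S(t) = exp(-0.1 t).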

5.4. Training of the neural network - preparations

5.4.1. The optimization function 

The training dataset comprises the data records 
of a plurality of patients for whom not only personal 
data and information about the medical condition, but
also information about the type of follow-up treatment 
and the further progress of the disease are known. From 
the collected data relating to the further progress of 
the disease, an "actual survival function" is constructed 
according to the following rules: if the predetermined 
event, for example a relapse or the death of the patient, 
has already occurred for a particular patient at a time 
t, then his contribution \delta to the "actual survival
function" is set to \delta = 0 before time t and to \delta = 1 at
time t and after time t. Patients for whom the
predetermined event has not yet occurred at the time the
training dataset was created ("censored" data) contribute
only \delta = 0 to the "actual survival function" at all
times. During the training phase the weights wxy of the
synapses and the other optimization parameters set out in 
section 5.2. above are then set in such a way that the 
survival function delivered by the neural network 
optimally matches the "actual survival function". 

This can be achieved, for example, by defining a 
suitable optimization function O for this purpose and 
searching for a local, in the most favorable case even 
the global, minimum of said optimization function in the 
space covered by the optimization parameters. To define 
the optimization function O, it is already known in the 
prior art to start from a so-called likelihood function
L:

O = -\ln L

According to the invention

L = \prod_j [f_j(t)]^{\delta_j} [S_j(t)]^{1-\delta_j}

is chosen to represent the likelihood function where, in
accordance with the notation introduced in section 5.3.,
fj(t) and Sj(t) denote the event density and the survival
function for the patient j of the training set, and
\delta_j is the contribution of patient j to the "actual
survival function". Said likelihood function has the
advantage that the computational effort rises only
approximately proportionately to the number of patients
included in the training dataset.
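
The optimization function O = -ln L for this likelihood can be sketched as a single loop over the patients, which makes the linear growth of the effort visible. The record layout (patient, t, delta) and the callables risk_fn and survival_fn are assumptions for illustration; they stand for whatever the trained network delivers for a given patient.

```python
import math

def neg_log_likelihood(records, risk_fn, survival_fn):
    """O = -ln L with L = prod_j f_j(t)^delta_j * S_j(t)^(1 - delta_j).

    records     : iterable of (patient, t, delta); delta = 1 if the event was
                  observed at time t, delta = 0 if the record is censored at t
    risk_fn     : hypothetical callable (patient, t) -> lambda_j(t)
    survival_fn : hypothetical callable (patient, t) -> S_j(t)
    """
    total = 0.0
    for patient, t, delta in records:  # one pass: effort grows linearly
        s = survival_fn(patient, t)
        if delta:
            f = risk_fn(patient, t) * s  # event density f = lambda * S
            total -= math.log(f)
        else:
            total -= math.log(s)
    return total
```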

Another way of representing the likelihood 
function is: 

L = \prod_j \frac{\exp[\sum_o B_o(t) A_{oj}]}{\sum_l \exp[\sum_o B_o(t) A_{ol}]}

where the product is formed across all patients j for
whom the predetermined event has already occurred at time
t, and where the first sum in the denominator of the
quotient is formed across all patients l for whom the
predetermined event has not yet occurred at time t.

The computational effort associated with this
representation does however rise approximately
proportionately to the square of the number of patients
included in the training dataset.
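
The quadratic growth of the effort can be seen in the following sketch of this second representation, where every observed event requires a sum over the patients still at risk. The callable score_fn, standing for the sum over o of B_o(t) A_o for a given patient, is hypothetical, and the convention that the patient with the event is counted in the denominator is an assumption taken from common practice rather than from the text.

```python
import math

def neg_log_partial_likelihood(times, events, score_fn):
    """-ln L for the quotient form of the likelihood.

    times[j]  : observation time of patient j
    events[j] : 1 if the event was observed at times[j], else 0 (censored)
    score_fn  : hypothetical callable (patient_index, t) -> sum_o B_o(t) * A_o
    """
    n = len(times)
    total = 0.0
    for j in range(n):                     # outer loop over observed events
        if not events[j]:
            continue
        t = times[j]
        # Inner loop over patients still event-free up to t (assumption:
        # patient j itself is included), hence the quadratic effort.
        denom = sum(math.exp(score_fn(l, t)) for l in range(n) if times[l] >= t)
        total -= score_fn(j, t) - math.log(denom)
    return total
```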

5.4.2. The initialization 

As is known per se in the prior art, to 
initialize the network optimization parameters, for 
example the weights of the synapses connecting the 
neurons, it is possible to assign stochastically to said 
parameters small values that conform to certain 
normalization rules. It is additionally possible here to
include in the normalization findings obtained in
preliminary test runs on neural networks having a greatly
simplified structure.

5.5. Training the neural network - simplex method

As is customary, the search for a local or the 
global minimum of the optimization function is performed 
in several steps or cycles. According to the invention, 
however, for the first time the simplex method proposed 
by Nelder and Mead (see section 6. "References") is used 
in a neural network for this search. A simplex is a
structure having (n+1) vertices in an n-dimensional space
which surrounds the current basepoint in the
n-dimensional space, i.e. a triangle in a 2-dimensional
space, a tetrahedron in a 3-dimensional space and so
forth. In what directions and at what distances from the
current basepoint the (n+1) vertices are arranged is
determined here from the vertices of the preceding cycle
on the basis of the characteristics of the optimization
function.
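
For orientation, a compact simplex search in the spirit of Nelder and Mead can be sketched as follows. It is a simplified textbook variant (reflection, expansion, inside contraction, shrink) with commonly used coefficients, not necessarily the variant used for the invention; the function name nelder_mead and all parameter defaults are assumptions.

```python
def nelder_mead(f, x0, step=0.5, max_iter=500, tol=1e-8):
    """Minimise f over n parameters using function values only."""
    n = len(x0)
    # Initial simplex: x0 plus one vertex displaced along each axis (n + 1 vertices).
    simplex = [list(x0)]
    for i in range(n):
        vertex = list(x0)
        vertex[i] += step
        simplex.append(vertex)
    values = [f(p) for p in simplex]

    for _ in range(max_iter):
        order = sorted(range(n + 1), key=lambda k: values[k])
        simplex = [simplex[k] for k in order]
        values = [values[k] for k in order]
        best, worst = simplex[0], simplex[-1]
        size = max(abs(p[i] - best[i]) for p in simplex[1:] for i in range(n))
        if size < tol:  # simplex has collapsed onto a point
            break
        centroid = [sum(p[i] for p in simplex[:-1]) / n for i in range(n)]

        def along(c):
            # Point on the line through the worst vertex and the centroid.
            return [centroid[i] + c * (centroid[i] - worst[i]) for i in range(n)]

        reflected = along(1.0)
        f_reflected = f(reflected)
        if f_reflected < values[0]:
            expanded = along(2.0)
            f_expanded = f(expanded)
            if f_expanded < f_reflected:
                simplex[-1], values[-1] = expanded, f_expanded
            else:
                simplex[-1], values[-1] = reflected, f_reflected
        elif f_reflected < values[-2]:
            simplex[-1], values[-1] = reflected, f_reflected
        else:
            contracted = along(-0.5)
            f_contracted = f(contracted)
            if f_contracted < values[-1]:
                simplex[-1], values[-1] = contracted, f_contracted
            else:
                # Shrink all vertices halfway towards the best one.
                simplex = [best] + [
                    [best[i] + 0.5 * (p[i] - best[i]) for i in range(n)]
                    for p in simplex[1:]
                ]
                values = [values[0]] + [f(p) for p in simplex[1:]]

    k_best = min(range(n + 1), key=lambda k: values[k])
    return simplex[k_best], values[k_best]
```

Only values of f are needed, no derivatives, and the step length and direction result from the geometry of the simplex itself.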

This method leads to a sequence of basepoints
along which the optimization function decreases
monotonically. It can be continued
until either (within given precision limits) a local or 
global minimum has been identified or another termination 
criterion has been fulfilled. In connection with said 
further termination criterion, the abovementioned 
validation dataset now comes into play: 

The abovementioned monotonic decrease in
basepoints can arise on the one hand from actually
objectifiable characteristics of the optimization
function specified for the training dataset. On the other
hand, it is also possible that the decrease occurs in the
range of a valley of the optimization function caused by
stochastic fluctuations. The latter effect however only
simulates a learning success. For this reason, according
to the invention the characteristics of the optimization
function specified on the basis of the validation dataset
are also investigated at the same basepoints. If it is
then determined that the basepoints of the "validation
data optimization function" also exhibit a monotonic
decrease, then it can be assumed that one is still in a
"true" learning phase of the neural network. If on the
other hand the sequence of basepoints of the "validation
data optimization function" stagnates, or if it even
rises again, it must be assumed that with respect to the
"training data optimization function" one is in a valley
caused by stochastic fluctuations which only simulates a
learning progress. The cyclical execution of the simplex
method can therefore be interrupted.
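
This termination criterion can be sketched as a simple check on the history of the "validation data optimization function". The function name should_stop and the parameters patience and min_delta are assumptions; the text itself only requires that the search be interrupted when the validation values stagnate or rise again.

```python
def should_stop(valid_history, patience=3, min_delta=0.0):
    """Return True if the validation objective has stagnated or risen.

    valid_history : values of the optimization function on the validation
                    dataset, one per simplex cycle, oldest first
    patience      : number of most recent cycles that must bring no
                    improvement (assumption, not specified in the text)
    min_delta     : smallest decrease that counts as an improvement
    """
    if len(valid_history) <= patience:
        return False
    best_before = min(valid_history[:-patience])
    return min(valid_history[-patience:]) >= best_before - min_delta
```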

The main advantage of the simplex method is that 
it can be performed solely on the basis of the 
optimization function, and also that the step length and 
step direction can be automatically specified. 

5.6. Training the neural network - structure simplification ("pruning")

Once the search for a local or the global minimum 
has been completed, the next training step is to 
investigate whether it is possible to simplify the 
structure of the neural network on the basis of the 
findings so far. This "pruning" is concerned with 
investigating which of the synapses have so little 
influence on the overall function of the neural network 
that they can be omitted. In the simplest case this can
be done, for example, by permanently setting the weight
assigned to them to zero. However, in principle it is equally
conceivable to "freeze" the weight of the respective 
synapse to a fixed value. It is advantageous to alternate 
between simplex optimization steps and structure 
simplification steps in an iterative process. It would of 
course be desirable for the neural network to undergo a 
new simplex optimization after a single synapse has been 
excluded. In view of the total time required for training 
however, this is unjustifiable. In practice a favorable 
compromise has proved to be the removal during a
structure simplification step of at most 10% of the 
synapses still present at the beginning of said step. 

According to the invention the two methods 
described below in sections 5.6.1. and 5.6.2. are used 
for structure simplification. 

5.6.1. Likelihood method 

With this method the value of the likelihood 
function is first calculated as a reference value on the 
basis of the complete structure of the neural network in 
its present state of training, i.e. using the current 
values of the weights of all synapses. Following this, 
the influence of a given synapse is suppressed, i.e. the 
value of the weight of this synapse is set to zero. The 
value of the likelihood function is then calculated for 
the thus simplified network structure, and the ratio of 
this value to the reference value is formed. 

Once said likelihood ratio has been calculated 
for all synapses, when performing the steps described 
below, a start is made with the synapse for which the 
value of the likelihood ratio is nearest to one: 

Assuming that the network structure has already 
been simplified by (x-1) synapses and the significance of 
the xth synapse is now being investigated, then the 
following three network structure variants are compared: 
firstly the complete structure of the neural network in 
its current state of training with all synapses still 
present prior to this structure simplification step,
secondly the network structure excluding the (x-1) 
synapses already suppressed in this structure 
simplification step, and thirdly the network structure 
now also excluding the xth synapse. Following this, using 
a significance test the third structure variant is 
compared firstly with the first structure variant 
(complete structure) and secondly with the second 
structure variant ((x-1) synapses suppressed). If even
just one of the two tests produces too great a deviation
from the third structure variant, then the respective
synapse is retained at least for the next simplex
optimization step.
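
The sequence of steps in this section can be sketched as follows. Several points are assumptions made only for illustration: the callable log_lik, the use of a likelihood-ratio statistic compared against 5% chi-squared critical values, the choice to continue with the next candidate after a rejected synapse, and the cap of five removals imposed by the small table of critical values. The limit of 10% of the synapses per step is taken from section 5.6.

```python
# 5% critical values of the chi-squared distribution for 1..5 degrees of freedom.
CHI2_CRIT_5PCT = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}

def prune_by_likelihood(synapses, log_lik, max_fraction=0.10):
    """Return the synapses suppressed in one structure simplification step.

    synapses : list of identifiers of the synapses currently present
    log_lik  : hypothetical callable; log_lik(removed) returns ln L of the
               network with the synapses in the frozenset `removed` set to zero
    """
    ll_full = log_lik(frozenset())
    # Likelihood ratio nearest to one <=> |ln L_without - ln L_full| smallest.
    ranked = sorted(synapses, key=lambda s: abs(log_lik(frozenset([s])) - ll_full))
    limit = min(int(max_fraction * len(synapses)), max(CHI2_CRIT_5PCT))

    removed = []
    for s in ranked:
        if len(removed) >= limit:
            break
        ll_prev = log_lik(frozenset(removed))        # second variant
        ll_new = log_lik(frozenset(removed + [s]))   # third variant
        # Third variant against the complete structure (first variant).
        ok_vs_full = 2.0 * (ll_full - ll_new) <= CHI2_CRIT_5PCT[len(removed) + 1]
        # Third variant against the second variant (one synapse more removed).
        ok_vs_prev = 2.0 * (ll_prev - ll_new) <= CHI2_CRIT_5PCT[1]
        if ok_vs_full and ok_vs_prev:
            removed.append(s)  # otherwise the synapse is retained for now
    return removed
```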

The CHI-SQUARED test (cf. section 6.
"References", Document .....) which is known per se can
be used as a significance test for example.
Alternatively, said significance test could also be
performed using the BOOTSTRAPPING method (cf. section 6.
"References", Document ) which is likewise known per
se. The use of the CHI-SQUARED test is particularly
favorable if the reaction of the neural network is
determined on the basis of a likelihood function. The
BOOTSTRAPPING method is also suitable with other types
of functions for representing the reaction of the neural
network.

5.6.2. Correlation method 

The exclusion or suppression of synapses 
according to the correlation method is based on the 
consideration that it could be possible for two neurons 
located on one and the same layer to have qualitatively 
the same influence on one neuron on a lower layer. In 
this case the reaction of the neural network, or to be 
more precise the response signal of said latter neuron, 
should not change significantly if said neuron is 
stimulated by only one of the two neurons located above 
it, and the influence of the second neuron is taken into 
account by strengthening the remaining synapse. It would 
then be possible to omit the synapse leading from the 
second neuron to the neuron in question. 

a. Synapses connecting input neurons and output neurons

In accordance with section 5.2.4., the
contribution of the response signal of two input neurons
to the stimulation signal of an output neuron takes the
form:

S_o = w_{1o} (A_1 - c_1) + w_{2o} (A_2 - c_2)

If one then assumes that the two response signals
A1 and A2 are correlated at least approximately to one
another in accordance with

A_2 = m A_1 + n

and that the weight w1o is greater than the weight w2o,
then the following holds for the stimulation signal So:

S_o = (w_{1o} + w_{2o} m) A_1 + (n w_{2o} - w_{1o} c_1 - w_{2o} c_2)

    = w^*_{1o} (A_1 - c^*_1)

where

w^*_{1o} = w_{1o} + w_{2o} m

and

c^*_1 = -(n w_{2o} - w_{1o} c_1 - w_{2o} c_2) / (w_{1o} + w_{2o} m)

If w*1o is not small, the behavior of the neural
network can be tested with the following assumptions:

1. Replace the weight w1o by w*1o;

2. Replace the parameter c1 by c*1; and

3. Suppress the synapse from the input neuron N2 to the
output neuron No.

If the outcome of this test, which can again be
performed as a CHI-SQUARED test for example, is positive,
then it is possible to omit the synapse from the input
neuron N2 to the output neuron No.
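
The replacement values for this case can be checked numerically with a short sketch. The function name merge_correlated_inputs and the threshold used to decide that w* is too small are assumptions; the formulas are those given above for the weight w* and the parameter c*.

```python
def merge_correlated_inputs(w1, c1, w2, c2, m, n, eps=1e-12):
    """Return (w_star, c_star) for the remaining synapse of input neuron N1.

    If A2 = m * A1 + n holds, then
        w1 * (A1 - c1) + w2 * (A2 - c2) == w_star * (A1 - c_star).
    """
    w_star = w1 + w2 * m
    if abs(w_star) < eps:  # assumption: threshold for "w* is too small"
        raise ValueError("w* is too small; the synapses cannot be merged")
    c_star = -(n * w2 - w1 * c1 - w2 * c2) / w_star
    return w_star, c_star
```

For example, with w1 = 2, c1 = 0.1, w2 = 0.5, c2 = -0.3, m = 0.8 and n = 0.2, both sides give the same stimulation contribution for any A1. Whether the change is kept still depends on the significance test, since real response signals are only approximately correlated.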

b. Synapses connecting input neurons and intermediate
neurons

The contribution of the response signal of two
input neurons to the stimulation signal of an
intermediate neuron can also be treated analogously, in
which case it is advisable, for reasons that will become
immediately apparent below, to treat the stimulation
signal of the intermediate neuron including its "bias":

S_h - b_h = w_{1h} A_1 + w_{2h} A_2

If one again assumes that the two response
signals A1 and A2 are correlated at least approximately to
one another in accordance with

A_2 = m A_1 + n

and that the weight w1h is greater than the weight w2h,
then the following holds for the stimulation signal Sh:

S_h - b_h = (w_{1h} + w_{2h} m) A_1 + n w_{2h}

or

S_h - b^*_h = w^*_{1h} A_1

where

w^*_{1h} = w_{1h} + w_{2h} m

and

b^*_h = b_h + n w_{2h}

If w*1h is not small, the behavior of the neural
network can be tested with the following assumptions:

1. Replace the weight w1h by w*1h;

2. Replace the bias bh by b*h; and

3. Suppress the synapse from the input neuron N2 to the
intermediate neuron Nh.

If the outcome of this test, which can again be
performed as a CHI-SQUARED test for example, is positive,
then it is possible to omit the synapse from the input
neuron N2 to the intermediate neuron Nh.

c. Synapses connecting intermediate neurons and output
neurons

Synapses leading from intermediate neurons to 
output neurons can also be treated analogously. With 
respect to the bias values bo, however, the further 
constraint mentioned in section 5.2.4. may need to be 
taken into account. 

5.6.3. Testing the topology 

The above-described pruning of the structure of 
the neural network can result in individual neurons no 
longer being connected to any other neurons. This is the
case for example if an input neuron is not connected to 
any intermediate neuron nor to any output neuron, or if 
an output neuron is not connected to any intermediate 
neuron nor to any input neuron. It is therefore only 
logical to completely deactivate these neurons that no 
longer have an influence on the function of the neural 
network.

Intermediate neurons that are still connected to 
neurons on the input layer but not to neurons on the 
output layer constitute a special case. Said intermediate 
neurons can no longer exert any influence on the function 
of the neural network. The synapses leading from the 
input layer to these intermediate neurons can therefore 
also be suppressed, i.e. the weights of said synapses can 
be set to zero. 

The converse case can however also occur, namely 
that an intermediate neuron is still connected to the
output layer, but no longer has any connection to the 
input layer. At best said intermediate neurons can output 
to the output neurons a response signal that is dependent 
on their "bias". However, a signal of this type has no 
information content whatsoever that would be significant 
for the function of the neural network. It is therefore 
also possible to suppress the remaining synapses of said 
intermediate neurons.
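
The two special cases for intermediate neurons can be sketched as a clean-up pass over the remaining synapses. The dictionary layout and the function name prune_dead_hidden are assumptions; a single pass suffices here because the sketch assumes only one intermediate layer, as in Fig. 1.

```python
def prune_dead_hidden(w_ih, w_ho):
    """Suppress synapses of intermediate neurons that lost all inputs or outputs.

    w_ih : dict {(input_id, hidden_id): weight}
    w_ho : dict {(hidden_id, output_id): weight}
    A missing key or a weight of 0.0 means that the synapse is suppressed.
    Returns cleaned copies of both dictionaries.
    """
    w_ih = {k: v for k, v in w_ih.items() if v != 0.0}
    w_ho = {k: v for k, v in w_ho.items() if v != 0.0}
    has_input = {h for (_, h) in w_ih}
    has_output = {h for (h, _) in w_ho}
    alive = has_input & has_output  # neurons connected on both sides
    w_ih = {(i, h): v for (i, h), v in w_ih.items() if h in alive}
    w_ho = {(h, o): v for (h, o), v in w_ho.items() if h in alive}
    return w_ih, w_ho
```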

5.7. Generalization

On completion of the training phase it is
necessary to test the performance of the trained neural 
network to obtain a measure of how meaningful the 
survival functions delivered by this neural network 
actually are. The abovementioned generalization dataset, 
which had no influence whatsoever on the training of the 
neural network and thus enables objective results, is 
used for this purpose. 

5.8. Concluding remarks

In conclusion it should be mentioned that, in
addition to the tumor-specific factors uPA and PAI-1
explicitly mentioned above which allow statements to be
made about invasion, it is also possible to take further 
such factors into account. Among others, these include 
factors for proliferation, for example the S phase and 
Ki-67, and other processes that influence tumor growth. 